Abstract. The statistical picture of the solution space for a binary perceptron is
studied. The binary perceptron learns a random classification of random input patterns
by a set of binary synaptic weights. Learning in this network is difficult, especially
when the pattern (constraint) density is close to the capacity, which is supposed to be
intimately related to the structure of the solution space. The geometrical organization
is elucidated by the entropy landscape from a reference configuration and of
solution-pairs separated by a given Hamming distance in the solution space. We evaluate
the entropy at the annealed level as well as at the replica symmetric level, and the
mean field result is confirmed by numerical simulations on single instances using the
proposed message passing algorithms.

1. Introduction

Learning in a single-layer feed-forward neural network with binary synapses has
been studied both through statistical mechanics analysis [1, 2, 3, 4] and in its algorithmic
aspects [5, 6, 7, 8, 9, 10, 11, 12]. This network can learn an extensive number $P = \alpha N$
of random patterns, where $N$ is the number of synapses and $\alpha$ denotes the constraint
density. The critical $\alpha$ (also called the capacity) separating the learnable phase from
the unlearnable phase is predicted to be $\alpha_s \simeq 0.833$, where the entropy vanishes [1]. A
solution is defined as a configuration of synaptic weights that implements the correct
classification of the $P$ random input patterns. Above $\alpha_s$, no solutions can be found with
high probability (converging to 1 in the thermodynamic limit). The replica symmetric
solution presented in Ref. [1] has been shown to be stable up to the capacity, which
is in accordance with the convexity of the solution space p]. Note that the solutions
disappear at the threshold $\alpha_s$ while still maintaining a typical finite value of the Hamming
distance between them, which is quite distinct from the case of the continuous perceptron
with real-valued synapses. In the continuous perceptron, this distance tends to zero when
the solutions disappear at the corresponding threshold [13]. On the other hand, many
local search algorithms [HI [71 [HI [12] have been proposed to find solutions of the perceptron
learning problem; however, the search process slows down with increasing $\alpha$, and the
critical $\alpha$ for the local search algorithm [12] decreases when the number of synapses
increases. This typical behavior of the stochastic local search algorithm is conjectured
to be related to the geometrical organization of the solution space [3, [1]. In order to
acquire a better understanding of the failure of the local search strategy, we compute
the entropy landscape both from a reference configuration and for solution-pairs with a
given distance in the solution space. Both distance landscapes contain rich information
about the detailed structure of the solution space and can therefore help us understand the
observed glassy behavior of the local search algorithms. Throughout the paper, the
term distance refers to the Hamming distance. The distance landscape has been well
studied in random constraint satisfaction problems defined on diluted or sparse random
graphs [14, 15, 16, 17].

Learning in the binary perceptron can be mapped onto a bipartite graph where
each variable node represents a synaptic weight and each function node represents a random
input pattern to be learned (see figure 1 (b)). This graph is also called a graphical model
or factor graph [18]. An efficient message passing learning algorithm for the binary
perceptronal learning problem has been derived using the cavity method and this factor
graph representation p]. In this paper, we focus on the typical properties of the solution
space in random ensembles of the binary perceptronal learning problem. We apply the
replica trick, widely used to study disordered systems [19], to compute the statistical
properties in the thermodynamic limit. To confirm the mean field result computed
using the replica approach, we derive the message passing equations in the cavity context,
which can be applied to random single instances of the current problem. In this context,
we apply the decorrelation assumption as well as the central limit theorem to derive the
formula at the replica symmetric level. This assumption arises from the weak correlation
among synaptic weights (within one pure state p^) and among input patterns [20]. The
efficiency of the inspired message passing algorithms in loopy systems has been observed
in Refs. [21, 22, 9, 23], while the underlying mechanism still needs to be fully understood.
However, our cavity method focuses on the physical content and yields the same result
as that obtained using the replica approach [2ni El].

Figure 1. The sketch of the binary perceptron and the factor graph representation.
(a) $N$ input units (open circles) feed directly to a single output unit (solid circle). A
binary input pattern $(\xi_1^\mu, \xi_2^\mu, \ldots, \xi_N^\mu)$ of length $N$ is mapped through a sign function to
a binary output $\sigma^\mu$, i.e., $\sigma^\mu = \mathrm{sgn}\left(\sum_{i=1}^{N} J_i \xi_i^\mu\right)$. The set of $N$ binary synaptic weights
$\{J_i\}$ is regarded as a solution of the perceptron problem if the output $\sigma^\mu = \sigma_0^\mu$ for each
of the $P = \alpha N$ input patterns $\mu \in [1, P]$, where $\sigma_0^\mu$ is a preset binary value. (b) Each
circle denotes a variable node whose value takes $J_i$ with $i$ its index. The square is
the function node denoting a random binary pattern to be learned. If the pattern is
learned by the synaptic vector $\mathbf{J}$, the value of the corresponding function node takes
zero. The dotted line represents the other $P - 4$ function nodes, while the dashed line for
variable node $i$ means $i$ is connected to the other $P - 4$ function nodes and that for a function
node means the function node (e.g., h) is connected to the other $N - 3$ variable nodes.

The remainder of this paper is organized as follows. The random classification by
the binary perceptron is defined in Sec. 2. In Sec. 3, we derive the self-consistent
equations to compute the distance landscape (entropy landscape) from a reference
configuration, i.e., to count the number of solutions at a distance from the reference
configuration. Both the annealed and replica symmetric (RS) computations of this
entropy landscape are presented. We also derive the message passing equations for
single instances in this section using the cavity method and factor graph representation.
In Sec. 4, the landscape of Hamming distances between pairs of solutions is evaluated
under both the annealed approximation and the RS ansatz, and the associated message
passing equations are proposed as well. Discussion and conclusion are given in Sec. 5.



2. Problem definition 

The binary perceptron realizes a random classification of $P$ random input patterns
(see figure 1 (a)). To be more precise, the learning task is to find an optimal set of
binary synaptic weights (solution) $\{J_i\}_{i=1}^{N}$ that maps correctly each of the random
input patterns $\{\xi_i^\mu\}$ $(\mu = 1, \ldots, P)$ to the desired output $\sigma_0^\mu$, which is assigned a value $\pm 1$
at random. $P$ is proportional to $N$ with the coefficient $\alpha$ defining the constraint density
(each input pattern serves as a constraint for all synaptic weights, see figure 1 (b)).
The critical value is $\alpha_s \simeq 0.833$, below which the solution space is non-empty [1]. Given
the input pattern $\boldsymbol{\xi}^\mu$, the actual output $\sigma^\mu$ of the perceptron is $\sigma^\mu = \mathrm{sgn}\left(\sum_{i=1}^{N} J_i \xi_i^\mu\right)$,
where $J_i$ takes $\pm 1$ and $\xi_i^\mu$ takes $\pm 1$ with equal probabilities. If $\sigma^\mu = \sigma_0^\mu$, we say that the
synaptic weight vector $\mathbf{J}$ has learned the $\mu$-th pattern. Therefore we define the number
of patterns mapped incorrectly as the energy cost

$$E(\mathbf{J}) = \sum_{\mu=1}^{P} \Theta\left(-\frac{\sigma_0^\mu}{\sqrt{N}} \sum_{i=1}^{N} J_i \xi_i^\mu\right) \qquad (1)$$

where $\Theta(x)$ is a step function with the convention that $\Theta(x) = 0$ if $x \le 0$ and $\Theta(x) = 1$
otherwise. The prefactor $N^{-1/2}$ is introduced to ensure that the argument of the step
function remains at the order of unity, facilitating the following statistical mechanical
analysis. In the current setting, both $\{\xi_i^\mu\}$ and the desired output $\{\sigma_0^\mu\}$ are generated
randomly and independently. Without loss of generality, we assume $\sigma_0^\mu = +1$ for any input
pattern in the remaining part of this paper, since one can perform a gauge transformation
$\xi_i^\mu \to \xi_i^\mu \sigma_0^\mu$ on each input pattern without affecting the result.
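
The energy cost and the gauge transformation described above can be illustrated with a small numerical sketch. This is only an illustration under stated assumptions, not code from the paper: the helper names `energy`, `random_instance` and `gauge`, the seed, and the sizes are hypothetical, and N is taken odd so that the field never vanishes.

```python
import random


def energy(J, patterns, targets):
    """Count the patterns that the weight vector J maps incorrectly (energy cost)."""
    n = len(J)
    cost = 0
    for xi, sigma0 in zip(patterns, targets):
        field = sum(j * x for j, x in zip(J, xi)) / n ** 0.5
        # The step function contributes 1 only when its argument is strictly positive.
        if -sigma0 * field > 0:
            cost += 1
    return cost


def random_instance(n, p, rng):
    """Draw p random +/-1 patterns of length n with random +/-1 desired outputs."""
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    targets = [rng.choice((-1, 1)) for _ in range(p)]
    return patterns, targets


def gauge(patterns, targets):
    """Gauge transformation xi_i -> xi_i * sigma0, after which every target is +1."""
    return [[x * s for x in xi] for xi, s in zip(patterns, targets)]


rng = random.Random(0)          # illustrative seed
N, P = 11, 4                    # odd N, so the field is never exactly zero
patterns, targets = random_instance(N, P, rng)
J = [rng.choice((-1, 1)) for _ in range(N)]

e_original = energy(J, patterns, targets)
e_gauged = energy(J, gauge(patterns, targets), [1] * P)
print(e_original, e_gauged)     # the two counts agree
```

Because the gauge transformation only multiplies each pattern by its own desired output, the energy of any weight vector is unchanged, which is why all targets can be set to +1 in the rest of the analysis.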

3. Distance landscape from a reference configuration 

In this section, we consider the entropy landscape from a reference configuration (which
is not a solution). This entropy counts the number of solutions at a distance $Nd$ from
the reference configuration $\mathbf{J}^*$. The behavior of this entropy landscape reflects the
geometrical organization of the solution space. Since we concentrate on the ground
state ($E = 0$), we take the inverse temperature $\beta \to \infty$ and introduce a coupling field $x$
to control the distance between solutions and the reference configuration. The partition
function for this setting is

$$Z = \sum_{\mathbf{J}} \prod_{\mu} \Theta\left(\frac{1}{\sqrt{N}} \sum_{i} J_i \xi_i^\mu\right) \exp\left(x \sum_{i} J_i J_i^*\right) \qquad (2)$$

where the sum $\sum_{\mathbf{J}}$ goes over all possible synaptic weight vectors and $\sum_i$ means the sum
over all variable nodes. Under the definition of the overlap $q = \frac{1}{N}\sum_i J_i J_i^*$, the partition
function can be written as

$$Z = \sum_{q} \exp\left[N\left(s(q) + xq\right)\right] \qquad (3)$$

where $e^{Ns(q)}$ is the number of solutions with the overlap $q$. In the thermodynamic limit
$N \to \infty$, the saddle point analysis leads to $f(x) = \frac{1}{N}\log Z = \max_q\left[s(q) + xq\right]$, where
$f(x)$ is defined as the free energy density. Therefore, we can determine the entropy $s(q)$
by a Legendre transform fiQ[ ITF]

$$s(q) = \min_{x}\left[f(x) - xq\right], \qquad (4)$$

$$q(x) = \frac{df(x)}{dx} \qquad (5)$$

where $q$ is related to $d$ through $d = \frac{1-q}{2}$, so that the entropy density can be expressed as
a function of the distance $d$, which can be understood as the probability that a synaptic
weight takes different values in $\mathbf{J}$ and $\mathbf{J}^*$. One recovers the total number of solutions
by setting $x = 0$ in Eq. (2).

3.1. Annealed approximation for s(d)

We first calculate the annealed entropy density $s_{\rm ann}(d)$, which serves as an upper
bound for the true value of the entropy density. Strictly speaking, the free energy $\log Z$
should be averaged over the random input patterns. The annealed approximation
instead averages the partition function first and then takes the logarithm,
$f_{\rm ann} = \frac{1}{N}\log\langle Z\rangle$, where the average is taken over the
distribution of the random input patterns. This can be computed as

$$\langle Z\rangle = \int \frac{dq\, d\hat{q}}{2\pi i/N} \exp\left[N\left(-q\hat{q} + xq - \alpha\log 2 + \log(2\cosh\hat{q})\right)\right] \qquad (6)$$

where the integral representation of $\Theta(\cdot)$ is used and the conjugated counterpart $\hat{q}$ of
the overlap $q$ is introduced when a Dirac delta function $\delta\left(q - \frac{1}{N}\sum_i J_i J_i^*\right)$ is inserted. A
saddle point analysis results in

$$f_{\rm ann} = \max_{q,\hat{q}}\left\{-q\hat{q} + xq - \alpha\log 2 + \log(2\cosh\hat{q})\right\} \qquad (7)$$

where the saddle point equations read $\hat{q} = x$, $q = \tanh\hat{q}$. Using Eq. (4) and the saddle
point equations, we get the annealed entropy density

$$s_{\rm ann}(d) = -\alpha\log 2 - d\log d - (1-d)\log(1-d). \qquad (8)$$
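
As a consistency check of the Legendre construction, the following sketch evaluates the annealed free energy at its saddle point, performs the Legendre transform of Eqs. (4) and (5) numerically, and compares with the closed form of Eq. (8). It also locates by bisection the distance below which the annealed entropy becomes negative. The function names and the bisection helper are illustrative assumptions, not part of the original analysis.

```python
import math


def f_ann(x, alpha):
    """Annealed free energy density at the saddle point q_hat = x, q = tanh(x)."""
    return math.log(2.0 * math.cosh(x)) - alpha * math.log(2.0)


def s_ann(d, alpha):
    """Closed-form annealed entropy density as a function of the distance d."""
    def xlogx(t):
        # t*log(t) tends to 0 as t -> 0, so the end points are handled explicitly
        return t * math.log(t) if t > 0.0 else 0.0
    return -alpha * math.log(2.0) - xlogx(d) - xlogx(1.0 - d)


def legendre_point(x, alpha):
    """Return (d, s) obtained from the coupling field x via the Legendre transform."""
    q = math.tanh(x)                  # q(x) = df/dx
    d = (1.0 - q) / 2.0
    s = f_ann(x, alpha) - x * q
    return d, s


def zero_crossing(alpha, lo=1e-12, hi=0.5, iters=200):
    """Bisection for d_min in (0, 1/2) with s_ann(d_min) = 0 (needs 0 < alpha < 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if s_ann(mid, alpha) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = 0.495
d_min = zero_crossing(alpha)
d_max = 1.0 - d_min                   # the annealed entropy is symmetric under d -> 1 - d
print(d_min, d_max)
```

Within the annealed approximation, solutions at distances outside $[d_{\min}, d_{\max}]$ from the reference configuration are exponentially rare, since the entropy density is negative there.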

3.2. Replica symmetric computation of s(d)

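The free energy density $f(x)$ is a self-averaging quantity whose value concentrates in
probability around its expectation in the thermodynamic limit, and its average over
the random input patterns is very difficult to compute because the logarithm appears
inside the average. The replica trick resolves this difficulty by using the identity
$\log Z = \lim_{n\to 0}\frac{Z^n - 1}{n}$. Then the disorder averaged free energy density can be computed
by first averaging an integer power of the partition function and then letting $n \to 0$ as

$$f = \lim_{n\to 0,\, N\to\infty} \frac{\log\langle Z^n\rangle}{nN}. \qquad (9)$$

To compute $\langle Z^n\rangle$, we introduce $n$ replicated synaptic weight vectors $\mathbf{J}^a$ $(a = 1, \ldots, n)$
as follows.

$$\langle Z^n\rangle = \int \prod_{a<b} \frac{dq^{ab}\, d\hat{q}^{ab}}{2\pi i/N} \exp\left[-N\sum_{a<b} q^{ab}\hat{q}^{ab} + N\alpha\log G_0(\{q^{ab}\}) + N G_1(\{\hat{q}^{ab}\})\right] \qquad (10)$$

where $G_0$ and $G_1$ are expressed respectively as

$$G_0(\{q^{ab}\}) = \prod_{a}\left[\int \frac{d\lambda^a\, d\hat{\lambda}^a}{2\pi}\, \Theta(\lambda^a)\right] \exp\left(i\sum_a \lambda^a\hat{\lambda}^a - \sum_{a<b}\hat{\lambda}^a\hat{\lambda}^b q^{ab} - \frac{1}{2}\sum_a (\hat{\lambda}^a)^2\right), \qquad (11)$$

$$G_1(\{\hat{q}^{ab}\}) = \log \sum_{J^a:\, a=1,\ldots,n} \exp\left(\sum_{a<b}\hat{q}^{ab} J^a J^b + x\sum_a J^a\right), \qquad (12)$$

where we have introduced the replica overlap $q^{ab} = \frac{1}{N}\sum_i J_i^a J_i^b$ and its associated
conjugated counterpart $\hat{q}^{ab}$. The replica symmetric ansatz assumes $q^{ab} = q$ and $\hat{q}^{ab} = \hat{q}$
for $a \neq b$. Now using the saddle point analysis, we finally arrive at the formula of the
free energy density and the corresponding saddle point equations,

$$f(x) = \frac{\hat{q}}{2}(q - 1) + \alpha\int Dz\, \log H\left(\sqrt{\frac{q}{1-q}}\, z\right) + \int Dz\, \log\left[2\cosh\left(\sqrt{\hat{q}}\, z + x\right)\right], \qquad (13)$$

$$q = \int Dz\, \tanh^2\left(\sqrt{\hat{q}}\, z + x\right), \qquad (14)$$

$$\hat{q} = \frac{\alpha}{1-q}\int Dz \left[\frac{G\left(\sqrt{\frac{q}{1-q}}\, z\right)}{H\left(\sqrt{\frac{q}{1-q}}\, z\right)}\right]^2 \qquad (15)$$

where $G(x) = \exp(-x^2/2)/\sqrt{2\pi}$ and $H(x) = \int_x^\infty Dz$ with the Gaussian measure
$Dz = G(z)\,dz$. After the fixed point of the self-consistent equations (14) and (15) is
obtained, the entropy landscape $s(d)$ is computed as

$$s(x) = f(x) - x\int Dz\, \tanh\left(\sqrt{\hat{q}}\, z + x\right). \qquad (16)$$

Note that the final expression of $s(x)$ does not depend on the reference configuration
and the integral in the second term of Eq. (16) is $q(x)$ defined in Eq. (5).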
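
The replica symmetric saddle point can be explored numerically with a short sketch like the one below. It is a minimal illustration under stated assumptions rather than the procedure used for the figures: Gaussian averages are approximated by a trapezoidal rule on a truncated grid, the two order parameters are updated by damped fixed-point iteration, and all function names, grid sizes and tolerances are illustrative choices. The free energy uses the standard replica symmetric form $f(x) = \frac{\hat{q}}{2}(q-1) + \alpha\int Dz\,\log H\left(\sqrt{q/(1-q)}\,z\right) + \int Dz\,\log[2\cosh(\sqrt{\hat{q}}\,z + x)]$.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def G(y):
    return math.exp(-0.5 * y * y) / SQRT2PI


def H(y):
    return 0.5 * math.erfc(y / SQRT2)


def _gauss_grid(zmax=8.0, n=401):
    """Nodes and normalized weights approximating the Gaussian measure Dz."""
    dz = 2.0 * zmax / (n - 1)
    zs = [-zmax + k * dz for k in range(n)]
    ws = [G(z) * dz for z in zs]
    total = sum(ws)
    return zs, [w / total for w in ws]


ZS, WS = _gauss_grid()


def gauss_avg(fn):
    return sum(w * fn(z) for z, w in zip(ZS, WS))


def solve_rs(alpha, x=0.0, damping=0.5, tol=1e-10, max_iter=5000):
    """Damped fixed-point iteration for the order parameters (q, q_hat)."""
    q, qhat = 0.1, 0.0
    for _ in range(max_iter):
        c = math.sqrt(q / (1.0 - q))
        qhat = alpha / (1.0 - q) * gauss_avg(lambda z: (G(c * z) / H(c * z)) ** 2)
        sq = math.sqrt(qhat)
        q_new = gauss_avg(lambda z: math.tanh(sq * z + x) ** 2)
        q_next = damping * q + (1.0 - damping) * q_new
        converged = abs(q_next - q) < tol
        q = q_next
        if converged:
            break
    return q, qhat


def rs_landscape_point(alpha, x=0.0):
    """Return (f, s, d): free energy, entropy and distance at coupling field x."""
    q, qhat = solve_rs(alpha, x)
    c = math.sqrt(q / (1.0 - q))
    sq = math.sqrt(qhat)

    def log2cosh(u):
        a = abs(u)
        return a + math.log1p(math.exp(-2.0 * a))

    f = (0.5 * qhat * (q - 1.0)
         + alpha * gauss_avg(lambda z: math.log(H(c * z)))
         + gauss_avg(lambda z: log2cosh(sq * z + x)))
    overlap = gauss_avg(lambda z: math.tanh(sq * z + x))   # overlap with the reference
    return f, f - x * overlap, 0.5 * (1.0 - overlap)


f0, s0, d0 = rs_landscape_point(0.2)             # typical entropy at x = 0
print(s0, d0)
```

Scanning `x` over positive and negative values traces out the curve $s(d)$; at $x = 0$ the overlap with the reference vanishes, so $d = 1/2$ and the entropy equals the typical entropy density of the original problem.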

3.3. Message passing equations for single instances

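In this section, we derive the message passing equations to compute the entropy
landscape for single instances under the replica symmetric ansatz. To derive the
self-consistent equations, we apply the cavity method [201 E] and first define two kinds of
cavity probabilities. One is the probability $p_{i\to a}^{J_i}$ that variable node $i$ in figure 1 (b) takes
value $J_i$ in the absence of constraint $a$. The other is $\hat{p}_{b\to i}^{J_i}$, standing for the probability
that constraint $b$ is satisfied (pattern $\mu = b$ is learned) if synaptic weight $i$ takes $J_i$.
According to the above definitions, the self-consistent equations for these two kinds of
probabilities are readily obtained as

$$p_{i\to a}^{J_i} = \frac{1}{Z_{i\to a}}\, e^{xJ_iJ_i^*} \prod_{b\in\partial i\setminus a} \hat{p}_{b\to i}^{J_i}, \qquad (17)$$

$$\hat{p}_{b\to i}^{J_i} = \sum_{\{J_j,\, j\in\partial b\setminus i\}} \Theta\left(\frac{1}{\sqrt{N}}\sum_{j\in\partial b} J_j\xi_j^b\right) \prod_{j\in\partial b\setminus i} p_{j\to b}^{J_j}, \qquad (18)$$

where $Z_{i\to a}$ is a normalization constant, $\partial i\setminus a$ denotes the neighbors of node $i$ except
constraint $a$ and $\partial b\setminus i$ denotes the neighbors of constraint $b$ except variable node $i$.
Eqs. (17) and (18) are actually the belief propagation equations 1201 E]. For the binary
perceptron, directly solving the belief propagation equations is computationally infeasible,
because the sum in Eq. (18) runs over $2^{N-1}$ configurations. To reduce
the computational complexity, we define $w_{b\to i} = \frac{1}{\sqrt{N}}\sum_{j\neq i} J_j\xi_j^b$. Note that the sum
involves $N - 1$ independent random terms; as a result, the central limit theorem implies
that $w_{b\to i}$ follows a Gaussian distribution with mean $\langle w_{b\to i}\rangle$ and variance $\sigma^2_{w_{b\to i}}$, where
$\langle w_{b\to i}\rangle = \frac{1}{\sqrt{N}}\sum_{j\neq i} m_j\xi_j^b$ and $\sigma^2_{w_{b\to i}} = \frac{1}{N}\sum_{j\neq i}(1 - m_j^2)$. Within the RS ansatz, the
clustering property $\langle J_iJ_j\rangle - \langle J_i\rangle\langle J_j\rangle \simeq 0$ for $i \neq j$ in the thermodynamic limit is used to
get the variance [9]. $m_j \equiv \langle J_j\rangle$ is the magnetization in statistical physics language. By
separating the term $\frac{1}{\sqrt{N}}J_i\xi_i^b$ from the sum in the $\Theta(\cdot)$ of Eq. (18), and approximating
the sum $\sum_{\{J_j,\, j\in\partial b\setminus i\}}$ by an integral over $w_{b\to i}$, we get finally

$$\hat{p}_{b\to i}^{J_i} = H\left(-\frac{J_i\xi_i^b + w_{b\to i}}{\sqrt{\sigma_{b\to i}}}\right) \qquad (19)$$

where $w_{b\to i} = \sum_{j\in\partial b\setminus i} m_{j\to b}\,\xi_j^b$ and $\sigma_{b\to i} = \sum_{j\in\partial b\setminus i}(1 - m_{j\to b}^2)$, in which the cavity
magnetization $m_{j\to b} = \tanh h_{j\to b}$. Using Eqs. (17) and (19), the cavity field $h_{j\to b}$ can
be obtained in the log-likelihood representation

$$h_{j\to b} = \frac{1}{2}\log\frac{p_{j\to b}^{+1}}{p_{j\to b}^{-1}} = xJ_j^* + \sum_{a\in\partial j\setminus b} u_{a\to j}, \qquad (20)$$

$$u_{a\to j} = \frac{1}{2}\log\frac{\hat{p}_{a\to j}^{+1}}{\hat{p}_{a\to j}^{-1}} = \frac{1}{2}\left[\log H\left(-\frac{\xi_j^a + w_{a\to j}}{\sqrt{\sigma_{a\to j}}}\right) - \log H\left(-\frac{-\xi_j^a + w_{a\to j}}{\sqrt{\sigma_{a\to j}}}\right)\right]. \qquad (21)$$

Notice that the cavity bias $u_{a\to j}$ can be approximated by
$\frac{\xi_j^a}{\sqrt{\sigma_{a\to j}}}\, G\left(\frac{w_{a\to j}}{\sqrt{\sigma_{a\to j}}}\right) \Big/ H\left(-\frac{w_{a\to j}}{\sqrt{\sigma_{a\to j}}}\right)$ in the large
$N$ limit. Eqs. (20) and (21) constitute the recursive equations to compute the free
energy density in the Bethe approximation [25]

$$f(x) = \frac{1}{N}\sum_i \Delta F_i - \frac{N-1}{N}\sum_a \Delta F_a, \qquad (22)$$

$$\Delta F_i = \log\left[e^{xJ_i^*}\prod_{b\in\partial i} H\left(-\frac{\xi_i^b + w_{b\to i}}{\sqrt{\sigma_{b\to i}}}\right) + e^{-xJ_i^*}\prod_{b\in\partial i} H\left(-\frac{-\xi_i^b + w_{b\to i}}{\sqrt{\sigma_{b\to i}}}\right)\right], \qquad (23)$$

$$\Delta F_a = \log H\left(-\frac{w_a}{\sqrt{\sigma_a}}\right), \qquad (24)$$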

where $\Delta F_i = \log Z_i$ and $\Delta F_a = \log Z_a$ are the free energy shifts due to variable
node ($i$) addition (and all its function nodes) and function node ($a$) addition [16]
respectively. Actually $Z_i$ is the normalization constant of the full probability $p_i^{J_i}$ and $Z_a$
the normalization constant of $p_a$ [25]. Here $w_a = \sum_{j\in\partial a} m_{j\to a}\,\xi_j^a$ and $\sigma_a = \sum_{j\in\partial a}(1 - m_{j\to a}^2)$.
Equations (20) and (21) can be solved by an iterative procedure with a random
initialization of the corresponding messages. After the iteration converges, the entropy
landscape $s(d)$ from the fixed reference configuration $\mathbf{J}^*$ can be computed according to
the Legendre transform Eqs. (4) and (5). The computational complexity is of the order
$O(N^2)$ for this densely connected graphical model. $f(x)$ computed based on Eq. (22)
does not depend on the reference configuration since the change of $\xi_i^\mu \to -\xi_i^\mu$ does not
affect the final result, consistent with Eq. (13).

Figure 2. (Color online) Distance landscape from a reference configuration.
The solid lines are the analytic annealed approximation (Eq. (8)) for $\alpha =
0.198, 0.495, 0.693, 0.792$ (from the top to the bottom) respectively. The horizontal
dashed line indicates the zero entropy value. The line connecting symbols is a guide to
the eye. The empty symbols stand for the numerical simulation results on systems with
$(N, P) = (1001, 198), (1001, 495), (1001, 694), (1001, 793)$ (from the top to the bottom)
using message passing algorithms. The result is the average over 20 random instances.
Solid symbols are the replica symmetric results computed numerically by solving the
saddle point equations.
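
The iterative procedure for single instances can be sketched as follows. This is a small-scale illustration under stated assumptions, not the implementation behind the figures: the reference configuration is taken as all +1, messages are updated in parallel with damping, the number of sweeps is fixed instead of testing convergence, and the helper names (`log_H`, `constraint_messages`, `bp_landscape_point`) are hypothetical. Each constraint $a$ sends $\log H\left(-(\pm\xi_j^a + w_{a\to j})/\sqrt{\sigma_{a\to j}}\right)$ to weight $j$, with $w_{a\to j}$ and $\sigma_{a\to j}$ the cavity mean and variance built from the cavity magnetizations, and the Bethe free energy is $f = \frac{1}{N}\sum_i \Delta F_i - \frac{N-1}{N}\sum_a \Delta F_a$.

```python
import math
import random


def log_H(y):
    """log H(y) with H(y) = erfc(y / sqrt(2)) / 2; asymptotic form for large y."""
    if y < 20.0:
        return math.log(0.5 * math.erfc(y / math.sqrt(2.0)))
    return -0.5 * y * y - math.log(y * math.sqrt(2.0 * math.pi))


def constraint_messages(xi_a, m_a):
    """For one pattern: log p_hat(J_j=+1), log p_hat(J_j=-1) per weight, and its free energy shift."""
    W = sum(m * x for m, x in zip(m_a, xi_a))
    S = sum(1.0 - m * m for m in m_a)
    log_plus, log_minus = [], []
    for m, x in zip(m_a, xi_a):
        w = W - m * x                         # cavity mean w_{a->j}
        sd = math.sqrt(S - (1.0 - m * m))     # sqrt of cavity variance sigma_{a->j}
        log_plus.append(log_H(-(x + w) / sd))
        log_minus.append(log_H(-(-x + w) / sd))
    return log_plus, log_minus, log_H(-W / math.sqrt(S))


def bp_landscape_point(xi, x=0.0, iters=100, damping=0.5, seed=0):
    """Return (f, s, d) for patterns xi[a][j] and reference configuration J* = all +1."""
    rng = random.Random(seed)
    P, N = len(xi), len(xi[0])
    m = [[0.01 * rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(P)]  # m_{j->a}
    for _ in range(iters):
        u = []
        for a in range(P):
            lp, lm, _ = constraint_messages(xi[a], m[a])
            u.append([0.5 * (p - q) for p, q in zip(lp, lm)])    # cavity biases u_{a->j}
        for j in range(N):
            h = x + sum(u[a][j] for a in range(P))               # full local field
            for a in range(P):
                target = math.tanh(h - u[a][j])                  # cavity magnetization
                m[a][j] = damping * m[a][j] + (1.0 - damping) * target
    plus, minus, f_fun = [x] * N, [-x] * N, 0.0
    for a in range(P):
        lp, lm, dfa = constraint_messages(xi[a], m[a])
        f_fun += dfa                                             # function node shifts
        for j in range(N):
            plus[j] += lp[j]
            minus[j] += lm[j]
    f_var, overlap = 0.0, 0.0
    for j in range(N):
        top = max(plus[j], minus[j])
        f_var += top + math.log(math.exp(plus[j] - top) + math.exp(minus[j] - top))
        overlap += math.tanh(0.5 * (plus[j] - minus[j]))
    f = f_var / N - (N - 1) / N * f_fun                          # Bethe free energy density
    q = overlap / N
    return f, f - x * q, 0.5 * (1.0 - q)


rng = random.Random(1)                 # illustrative instance
N, P = 101, 20
xi = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
f0, s0, d0 = bp_landscape_point(xi, x=0.0)
f1, s1, d1 = bp_landscape_point(xi, x=0.5)
print(s0, d0, s1, d1)
```

Each sweep costs a number of operations proportional to $NP$ because the sums defining the cavity mean and variance are computed once per pattern and the contribution of the target weight is then subtracted.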

The distance landscape from a reference configuration is reported in figure 2. We
choose the reference configuration $\mathbf{J}^* = \{J_i^* = 1\}_{i=1}^{N}$ for simplicity. Other choices of
the reference configuration still yield the same behavior of the landscape. Note that
the annealed entropy provides an upper bound for the RS one, and it roughly coincides
with the RS one at low $\alpha$ (around the maximal point), while a large deviation is
observed when $\alpha$ further increases. It is clear that most of the solutions concentrate
around the dominant point where the maximum of the entropy is reached. When the
given distance is larger or smaller than certain values ($d > d_{\max}$ or $d < d_{\min}$), the number of
solutions at those distances becomes exponentially small in $N$. In the intermediate range
($d \in [d_{\min}, d_{\max}]$), as the distance increases, the number of solutions separated by the
distance from the reference point in the solution space increases first and then reaches
the maximum which dominates the typical value of the entropy in the original system
(by setting $x = 0$, see figure 3). The maximum is then followed by a decreasing trend as
the distance is further increased. This mean field behavior is confirmed by the numerical
simulations on large-size single random instances using the message passing algorithms
derived in Sec. 3.3. The consistency between the mean field result obtained by the replica
approach and the simulation result obtained on single random instances is clearly shown
in figure 2. The bell shape in figure 2 is similar to that observed in calculating the growth
rate of the expected distance enumerator for a random code ensemble [26]. Note that as
the constraint density increases, the distance range where solutions exist shrinks, which
illustrates clearly how the solution space changes as more patterns are presented to the
binary perceptron.

Figure 3. (Color online) Entropy density and typical distance between any two
solutions as a function of constraint density. The vertical dashed line indicates the
capacity for the binary perceptron.

We also compute the typical value of the entropy in the original system (by setting
$x = 0$) and of the distance between any two solutions as a function of the constraint
density using the replica method. The result is reported in figure 3. Here we define
the typical value of the distance between any two solutions as $d_{\rm rs} = \frac{1-q}{2}$, where $q$ is
obtained from the stationary value of Eq. (TT^ . The entropy vanishes at $\alpha_s \simeq 0.833$
with a finite typical value of distance pQ. This typical distance is also in accordance
with that computed on single instances by sampling a finite number of solutions jTT] .
Note that this distance is evaluated here based on the RS ansatz. One can further check
its stability by the population dynamics on the one-step replica symmetry breaking
(1RSB) solution |2S] where we define two typical distances: one is the inter-cluster distance
$d_0 = \frac{1 - [\langle\tanh h\rangle^2]}{2}$, where $\langle\cdot\rangle$ means the average over clusters and $[\cdot]$ over the disorder,
and the other is the intra-cluster distance defined by $d_1 = \frac{1 - [\langle\tanh^2 h\rangle]}{2}$ [27], where $h$ is the
local field defined in Eq. (20) by including all contributions of patterns around the
synaptic weight ($x = 0$). In general, solutions within a single cluster are separated by
a sub-extensive number of synaptic weights while any two clusters are separated by an
extensive number of synaptic weights. Our numerical simulations confirmed that $d_0$ and
$d_1$ turn out to be identical (equal to $d_{\rm rs}$) after sufficient iterations, implying that
the RS ansatz is unbroken below the capacity. However, for constraint density close
to the capacity, one needs a much larger sampling interval (by way of the Metropolis
importance sampling method) [28] in the population dynamics algorithm.

4. Distance landscape of solution-pairs 

The geometrical property of the solution space can also be studied by counting the
number of solution-pairs with a predefined distance $d$, or equivalently an overlap of value
$q$. Actually this entropy value may be much larger than the entropy density of the
original problem (which is obtained by setting $x = 0$ in Eq. (22)). As we shall present
later, this case becomes more involved for the binary perceptron, with an increasing
computational cost.

Considering the distance between solutions, we write the partition function as

$$Z = \sum_{\mathbf{J}^1, \mathbf{J}^2} \prod_{\mu} \Theta\left(\frac{1}{\sqrt{N}}\sum_i J_i^1\xi_i^\mu\right) \Theta\left(\frac{1}{\sqrt{N}}\sum_i J_i^2\xi_i^\mu\right) \exp\left(x\sum_i J_i^1 J_i^2\right) \qquad (25)$$

where the coupling field $x$ is used to control the distance between a pair of solutions
$(\mathbf{J}^1, \mathbf{J}^2)$ and the associated overlap $q = \frac{1}{N}\sum_i J_i^1 J_i^2$. This partition function has been
used to predict the optimal coupling field for a multiple random walking strategy to find
a solution for the perceptronal learning problem [12]. In the following sections, we
present an annealed computation as well as an RS computation of the distance landscape
$s(d)$. Note that in this setting, Eqs. (3), (4) and (5) can also be used, but here $d$ should
be understood as the distance separating two solutions in the weight space.

4.1. Annealed approximation for s(d)



Following the same techniques used in Sec. 3.1, we obtain the annealed free energy
density as (see also Ref. [12])

$$f_{\rm ann} = \max_{q,\hat{q}}\left\{-q\hat{q} + xq + \log(4\cosh\hat{q}) + \alpha\log\int_0^\infty Dt\, H\left(-\frac{qt}{\sqrt{1-q^2}}\right)\right\}. \qquad (26)$$

The maximization with respect to $q$ and $\hat{q}$ leads to the following saddle point equations

$$q = \tanh\hat{q}, \qquad (27)$$

$$\hat{q} = x + \frac{\alpha}{\sqrt{1-q^2}\;\mathrm{arccot}\left(-\frac{q}{\sqrt{1-q^2}}\right)}, \qquad (28)$$

where the identity $\int_0^\infty Dt\, H\left(-\frac{qt}{\sqrt{1-q^2}}\right) = \frac{1}{2\pi}\,\mathrm{arccot}\left(-\frac{q}{\sqrt{1-q^2}}\right)$, with arccot
taking values in $(0, \pi)$, has been used. Using
Eq. (4) and the above saddle point equations, we get the final expression for $s_{\rm ann}(d)$:

$$s_{\rm ann}(d) = \log 2 - (1-d)\log(1-d) - d\log d + \alpha\log\left[\frac{1}{2\pi}\,\mathrm{arccot}\left(-\frac{1-2d}{2\sqrt{d(1-d)}}\right)\right] \qquad (29)$$

where $s_{\rm ann}(0) = (1-\alpha)\log 2$, which is actually the annealed entropy density of the
original system [1]. If $\alpha = 0$, then $s_{\rm ann}(0) = \log 2$, which is in accord with the fact
that the number of solution-pairs with a distance $d = 0$ should be the total number of
solutions $2^N$ if no constraints are present.
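
The annealed pair entropy can be checked numerically with the sketch below. It rests on the fact that, for two weight vectors with overlap $q$, the probability that both classify a random pattern correctly is $\int_0^\infty Dt\, H\left(-qt/\sqrt{1-q^2}\right) = \frac{1}{2\pi}\mathrm{arccot}\left(-q/\sqrt{1-q^2}\right) = \frac{1}{4} + \frac{\arcsin q}{2\pi}$, so that $s_{\rm ann}(d) = \log 2 - (1-d)\log(1-d) - d\log d + \alpha\log$ of that probability at $q = 1 - 2d$. The function names and the quadrature parameters are illustrative assumptions; `arccot` is implemented through `math.atan2` so that it takes values in $(0, \pi)$.

```python
import math


def arccot(y):
    """Inverse cotangent with values in (0, pi)."""
    return math.atan2(1.0, y)


def pair_prob(q):
    """Probability that two weight vectors with overlap q both satisfy one random pattern."""
    if q >= 1.0:
        return 0.5
    if q <= -1.0:
        return 0.0
    return arccot(-q / math.sqrt(1.0 - q * q)) / (2.0 * math.pi)


def pair_prob_quadrature(q, tmax=10.0, n=20001):
    """Trapezoidal estimate of int_0^inf Dt H(-q t / sqrt(1 - q^2)), for |q| < 1."""
    c = q / math.sqrt(1.0 - q * q)
    dt = tmax / (n - 1)
    total = 0.0
    for k in range(n):
        t = k * dt
        weight = 0.5 if k in (0, n - 1) else 1.0
        gauss = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += weight * gauss * 0.5 * math.erfc(-c * t / math.sqrt(2.0))
    return total * dt


def s_ann_pair(d, alpha):
    """Annealed entropy density of solution-pairs at distance d, for 0 <= d < 1."""
    def xlogx(t):
        return t * math.log(t) if t > 0.0 else 0.0
    q = 1.0 - 2.0 * d
    return math.log(2.0) - xlogx(1.0 - d) - xlogx(d) + alpha * math.log(pair_prob(q))


print(s_ann_pair(0.0, 0.5), (1.0 - 0.5) * math.log(2.0))   # the two values coincide
```

At $d = 0$ the two weight vectors coincide, the pair probability reduces to 1/2, and the expression returns $(1-\alpha)\log 2$.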

4.2. Replica symmetric computation of s(d)

In this section, we derive the free energy density $f(x)$ for the landscape of solution-pairs
under the replica symmetric approximation, using the replica trick introduced in Sec. 3.2.
Since the partition function in this case involves a sum over all possible configurations of
two synaptic weight vectors, the computation becomes more involved. The disorder
average of the integer power of the partition function can be evaluated as



|jl,a j2,a| 



l,a j2.ay a<b -^ \ i / \ i J 

xs U' - ^ E j'^'j^"] n / dR'-'s (^"" - ^ E ^^"^^) 

J]0(w;r)e«")\ e"S-> •''"•'''", (30) 



Tl,a r'2,1 

I m m *-~x / I r. 1 1.\ y"< / 1 1.. 1 1. \ \ -"r" \ 



where $w_1^{\mu,a} = \frac{1}{\sqrt{N}}\sum_i J_i^{1,a}\xi_i^\mu$ and $w_2^{\mu,a} = \frac{1}{\sqrt{N}}\sum_i J_i^{2,a}\xi_i^\mu$. Under the replica symmetric
ansatz, the disorder average is carried out as



X[Q{wr)Q{w^^n) = Dz DyH{y,)H{y: 




-| n 



(31) 



where J Dz = f Dzi f Dz2 J Dt, y, = - "^,X-~n+r ("^ = 1' 2) and we have used 
Qi^ = q^^ = g, r"-'' = r, i?"'^ = R under the RS ansatz. After the computation of the 

summation in Eq. ([30]) by using J2a,b JtJt = ' +^^ ' ^ f--^ ' ^ [i^b i ) ^^^ 

the Hubbard-Stratonovich transform, we get the replica symmetric free energy density

$$f(x) = \alpha\int D\mathbf{z}\, \log F_1(q, R, r) + xR + \hat{q}(q - 1) + \frac{1}{2}r\hat{r} - R\hat{R} + \int D\hat{\mathbf{z}}\, \log F_2(\hat{q}, \hat{R}, \hat{r}), \qquad (32)$$

where $\int D\hat{\mathbf{z}} = \int Dz_1\int Dz_2\int Dz_3$, $F_1(q, R, r) = \int Dy\, H(y_1)H(y_2)$ and $F_2(\hat{q}, \hat{R}, \hat{r}) =
2e^{a_3}\cosh(a_1 + a_2) + 2e^{-a_3}\cosh(a_1 - a_2)$, where $a_k = \sqrt{\hat{q} - \hat{r}/2}\, z_k + \sqrt{\hat{r}/2}\, z_3$ $(k = 1, 2)$ and
$a_3 = \hat{R} - \hat{r}/2$. The RS order parameters $(q, R, r, \hat{q}, \hat{R}, \hat{r})$ are determined by the following
self-consistent equations

''-''^1-q-R + rJ ''''jDyHiy,)Hiy,y ^''^ 

2a f ^ jDyG{y,)H{y,) f DyG{y2)H{y,) ^^^^ 



1-q-R + rJ [J DyHiy,)H{y., 

^Dy[G{y,)H{y2)-G{y2)H{yr)] 



2 
r a 



+ —, r / Dz 

2 2(1- q-R + r 



j DyH{y,)H{y2) 



(35) 



, tanh 03 + tanh ai tanh 02 , , 

R = / Dz- , (36) 

1 + tanh 03 tanh ai tanh 02 



tciiiiiu,3i^tciiiii u,i -r tdiiii U2; -r i^ciiiiiui tciiiiiu,2i,x -r udiiii 1x3; 
q =r+ I Dz ^^_^ ^ ^_^ ^ ^_^ ^ ^ .^ ^ . (38) 



^ ^tanha3(tanh^ ai + tanh^ 02) + tanhoi tanh 02(1 + tanh^ 03) 
(1 + tanh 03 tanh Oi tanh 02) 
(tanh 03 — l)^(tanhai —tanh 02)^ 
2(1 + tanh 03 tanh ai tanh 02) 
In the derivation of the above saddle point equations, we have used a useful property
of the Gaussian measure, $\int Dz\, zF(z) = \int Dz\, F'(z)$, where $F'(z)$ is the derivative of the
function $F(z)$ with respect to $z$. After the fixed point of the above saddle point equations
is obtained, one can compute the entropy density $s = f(x) - xR$ with $d = \frac{1-R}{2}$. Note that
$R - r$ may become negative; in this case we replace $R$ and $r$ by $-R$ and $-r$ respectively,
and $y$ by $-y$ only for $y_2$ in Eqs. (33) to (35).



4.3. Message passing equations for single instances

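By analogy with the definitions in Sec. 3.3, we denote by $p_{i\to a}^{J_i^1, J_i^2}$ the probability that the
synaptic weight $i$ takes a two-component vector state $(J_i^1, J_i^2)$ in the absence of constraint
$a$, and by $\hat{p}_{b\to i}^{J_i^1, J_i^2}$ the probability that constraint $b$ is satisfied given the vector state $(J_i^1, J_i^2)$
of weight $i$. These two cavity probabilities obey the following recursive equations

$$p_{i\to a}^{J_i^1, J_i^2} = \frac{1}{Z_{i\to a}}\, e^{xJ_i^1J_i^2} \prod_{b\in\partial i\setminus a} \hat{p}_{b\to i}^{J_i^1, J_i^2}, \qquad (39)$$

$$\hat{p}_{b\to i}^{J_i^1, J_i^2} = \sum_{\{J_j^1, J_j^2;\, j\in\partial b\setminus i\}} \Theta\left(\frac{1}{\sqrt{N}}\sum_{j\in\partial b} J_j^1\xi_j^b\right) \Theta\left(\frac{1}{\sqrt{N}}\sum_{j\in\partial b} J_j^2\xi_j^b\right) \prod_{j\in\partial b\setminus i} p_{j\to b}^{J_j^1, J_j^2}, \qquad (40)$$

where $Z_{i\to a}$ is a normalization constant. In fact the belief propagation equations (39)
and (40) correspond to the stationary point of the Bethe free energy function of the
current system [291 EH]. The exchange of $\mathbf{J}^1$ and $\mathbf{J}^2$ does not change the partition
function Eq. (25), thus the cavity probabilities have the property that $p_{i\to a}^{+1,-1} = p_{i\to a}^{-1,+1}$
and $\hat{p}_{b\to i}^{+1,-1} = \hat{p}_{b\to i}^{-1,+1}$. This symmetry property will simplify the following analysis a lot.
To simplify Eqs. (39) and (40), we need the joint distribution of $w_{b\to i}^1$ and $w_{b\to i}^2$,
where $w_{b\to i}^1 = \frac{1}{\sqrt{N}}\sum_{j\neq i} J_j^1\xi_j^b$ and $w_{b\to i}^2 = \frac{1}{\sqrt{N}}\sum_{j\neq i} J_j^2\xi_j^b$. Since we impose a distance
constraint upon two solutions $\mathbf{J}^1$ and $\mathbf{J}^2$ in Eq. (25), there exists correlation between
these two normally distributed random numbers and this correlation is characterized by
the correlation coefficient

$$\rho_{b\to i} = \frac{\sum_{j\in\partial b\setminus i}\left(q_{j\to b} - m_{j\to b}^2\right)}{\sigma_{b\to i}} \qquad (41)$$

due to the symmetry property. Based on Eq. (39), messages $q_{j\to b}$ and $m_{j\to b}$ are
determined respectively by the following equations,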



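$$q_{j\to b} = \frac{\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,+1} + \prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{-1,-1} - 2e^{-2x}\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,-1}}{\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,+1} + \prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{-1,-1} + 2e^{-2x}\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,-1}}, \qquad (42)$$

$$m_{j\to b} = \frac{\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,+1} - \prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{-1,-1}}{\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,+1} + \prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{-1,-1} + 2e^{-2x}\prod_{a\in\partial j\setminus b}\hat{p}_{a\to j}^{+1,-1}}. \qquad (43)$$

Therefore, both $w_{b\to i}^1$ and $w_{b\to i}^2$ obey a bivariate normal distribution and $\hat{p}_{b\to i}^{J_i^1, J_i^2}$ is reduced
to be

$$\hat{p}_{b\to i}^{J_i^1, J_i^2} = \int_{-\frac{J_i^1\xi_i^b + w_{b\to i}}{\sqrt{\sigma_{b\to i}}}}^{\infty} Dt\, H\left(-\frac{J_i^2\xi_i^b + w_{b\to i} + \rho_{b\to i}\sqrt{\sigma_{b\to i}}\, t}{\sqrt{(1 - \rho_{b\to i}^2)\,\sigma_{b\to i}}}\right) \qquad (44)$$

where $w_{b\to i} = \sum_{j\in\partial b\setminus i} m_{j\to b}\,\xi_j^b$ and $\sigma_{b\to i} = \sum_{j\in\partial b\setminus i}(1 - m_{j\to b}^2)$. The overlap $q$ is
determined by $q(x) = \frac{1}{N}\sum_i q_i$ where $q_i$ is given by

$$q_i = \frac{\prod_{b\in\partial i}\hat{p}_{b\to i}^{+1,+1} + \prod_{b\in\partial i}\hat{p}_{b\to i}^{-1,-1} - 2e^{-2x}\prod_{b\in\partial i}\hat{p}_{b\to i}^{+1,-1}}{\prod_{b\in\partial i}\hat{p}_{b\to i}^{+1,+1} + \prod_{b\in\partial i}\hat{p}_{b\to i}^{-1,-1} + 2e^{-2x}\prod_{b\in\partial i}\hat{p}_{b\to i}^{+1,-1}}. \qquad (45)$$
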
Eq. (44) is more computationally demanding than Eq. (19) since an additional
numerical integral is required to compute $\hat{p}$ here. However, the integral in Eq. (44)
can be approximated by $c_0 H$ ( ^-^ j if we write the right hand side of Eq. (44) as
$\int^{\infty} Dt\, e^{\log H(a - bt)}$ and expand $H(a - bt)$ up to the second order in $bt$. The constants $c_0$,
$c_1$ and $c_2$ can be determined as a function of $a$ and $b$. Therefore, this approximation is
accurate only when large $bt$ has vanishing contribution to the integral.

The free energy shift due to variable node addition (and all its adjacent constraints)
can be obtained as $\Delta F_i = \log Z_i$ and the free energy shift due to constraint addition
as $\Delta F_a = \log Z_a$, where



UPt 



+i,+i 



Za 



bedi 

oo 



DtH 






where Wa = T.j&da'^j-^ai'-. Oa = Eieaa(l - ^j-^a) and Pa 




(46) 



(47) 



The free energy density can then be obtained using Eq. (22) and the entropy landscape can be
obtained correspondingly. The recursive equations Eqs. (41), (42), (43) and (44) can be
solved by an iterative procedure similar to that used in Sec. 3.3.

Figure 4. (Color online) Distance landscape of solution-pairs with a predefined
distance $d$. The solid lines are the analytic annealed approximation (Eq. (29)) for
$\alpha = 0.198, 0.495, 0.693, 0.792$ (from the top to the bottom) respectively. The line
connecting symbols is a guide to the eye. The empty symbols stand for the numerical
simulation results on systems with $(N, P) = (501, 99), (501, 248), (501, 347), (501, 397)$
(from the top to the bottom) using message passing algorithms. The result is the
average over 20 random instances. Solid symbols are the replica symmetric results
computed numerically by solving the saddle point equations.



As is seen from figure HI the entropy density increases smoothly until a maximum 
is reached for a = 0.198 and then decreases as the distance further grows. Interestingly, 
this behavior can be well fitted by the annealed approximation keeping the concavity 
of the entropy function. However, as a increases, large deviation from the annealed 
approximation occurs. The mean field calculations are supported by the numerical 
simulations on single instances using the proposed message passing algorithms, as 
shown in figure HI As a increases, the message passing algorithm requires a large 
number of iteration steps to converge (especially at small distances) and additionally a 
computationally expensive Monte Carlo integral involved in Eq. (jUJ can not be avoided. 
On the other hand, when a is close to the capacity, the typical value of the entropy 
for small non-zero distance (corresponding to large positive coupling field) becomes 
hard to evaluate since the order parameter R easily evolves towards 1. The distance 
corresponding to the maximum of the entropy landscape curve in figure H] is actually 
the typical distance drs calculated in figure |3l and s{d = 0) recovers the typical entropy 
density of the original problem. By taking the limit i? — )■ 1 in Eq. (l32l) . one can 
show that s{d = 0) = f{x = 0) where /(x) is given by Eq. flT3|) . As the constraint 
density increases, the maximal point of the entropy curve moves to the left; however,
solution-pairs still maintain a relatively broad distribution in the solution space when
α approaches α_s, e.g., d_max = argmax_d {s(d) = 0} ≃ 0.332 at α = 0.82 (d_rs ≃ 0.222),
which may be responsible for the algorithmic hardness in this region.
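
To make the shape of such a landscape concrete, the following minimal Python sketch evaluates a first-moment (annealed) estimate of the entropy of solution-pairs at normalized Hamming distance d. It is an illustration under stated assumptions, not a reproduction of the equations of this paper: it assumes unbiased random ±1 patterns, uses the standard orthant probability 1/4 + arcsin(q)/(2π) for two Gaussian fields with overlap q = 1 − 2d, and the function names (`annealed_pair_entropy`, `landscape_summary`) are hypothetical. Under these assumptions the estimate reads s_ann(d) = ln 2 − d ln d − (1 − d) ln(1 − d) + α ln[1/4 + arcsin(1 − 2d)/(2π)], which reduces to the annealed entropy (1 − α) ln 2 of the original problem at d = 0.

```python
import math


def pair_probability(q):
    """P(u > 0, v > 0) for two unit-variance Gaussian fields with correlation q."""
    q = max(-1.0, min(1.0, q))  # guard against rounding outside [-1, 1]
    return 0.25 + math.asin(q) / (2.0 * math.pi)


def binary_entropy(d):
    """-d ln d - (1 - d) ln(1 - d) in nats, with the convention 0 ln 0 = 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log(d) - (1.0 - d) * math.log(1.0 - d)


def annealed_pair_entropy(d, alpha):
    """First-moment estimate of the entropy density of solution-pairs at distance d.

    Assumption (not taken from the paper): weights are +-1, patterns are
    unbiased random +-1 vectors, and the overlap of the pair is q = 1 - 2d.
    """
    p = pair_probability(1.0 - 2.0 * d)
    if p <= 0.0:
        return float("-inf")  # antipodal pairs can never both be solutions
    return math.log(2.0) + binary_entropy(d) + alpha * math.log(p)


def landscape_summary(alpha, grid=2000):
    """Return (d_peak, d_max): the grid maximizer of the annealed entropy and
    the largest grid distance at which it is still non-negative."""
    ds = [i / grid for i in range(grid + 1)]
    values = [annealed_pair_entropy(d, alpha) for d in ds]
    d_peak = ds[max(range(len(ds)), key=values.__getitem__)]
    d_max = max((d for d, s in zip(ds, values) if s >= 0.0), default=None)
    return d_peak, d_max


if __name__ == "__main__":
    # Constraint densities alpha = P / N of the simulated systems with N = 501.
    for p_count in (99, 248, 347, 397):
        alpha = p_count / 501
        print(round(alpha, 3), landscape_summary(alpha))
```

A grid search is used instead of solving the stationarity condition because it also yields the largest distance with non-negative entropy in the same pass. Being an annealed estimate, it should only be compared with the replica symmetric curves at low α.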

5. Discussion and Conclusion 

We have studied the typical properties of the distance landscape, both from a reference
configuration and for pairs of solutions. For the first distance landscape, as the distance
increases, the number of associated solutions first grows, reaches its maximum
(which dominates the typical value of the entropy in the original system), and then
decreases. This typical trend is confirmed by the numerical simulations on
single instances using the proposed message passing algorithms. This behavior suggests 
that most of the solutions concentrate around the dominant point (the maximum in 
the distance landscape) in the N-dimensional weight space. It is clear that as the
constraint density increases, the distance landscape shows larger and larger deviation 
from the analytic annealed approximation. We also calculate the second distance 
landscape characterizing the number of solution-pairs separated by a given distance. 
In this case, the replica symmetric result is in good agreement with the annealed
computation at low α, while a large deviation is observed between the replica
symmetric approximation and the annealed computation at high α. Both landscapes are
evaluated in the thermodynamic limit and confirmed by message passing simulations on 
large-size single instances. 

In this paper, we calculate the whole picture of the distance (entropy) landscape 
and show that the entropy value rises to a maximum before declining at higher values 
of distance. However, as observed in the single-instance studies using local search
heuristics based on single- or double-weight flips [11] or on cooperative random
walks [12], the connection pattern in the solution space becomes quite heterogeneous
at large constraint density for finite problem sizes. This suggests that either very
narrow corridors (namely entropic bridges [31, 32, 17]) may connect different components,
leading to entropic trapping of local search heuristics, or energetic barriers
(connected components are sub-extensively separated) are present in the solution space.
Here we define the connected component in the weight space as a cluster of solutions in
which any two solutions are connected by a path of successive single-weight flips [4, 11].
The connected components are separated by a sub-extensive distance k(N) whose
dependence on N is still unclear and very difficult to determine [29]. Solutions in
different components are k-flip stable (i.e., the solution in one component after flips of
at most k synaptic weights is still a solution but in another component). For any finite
system, k is an increasing function of N with the property lim_{N→∞} k/N = 0 [331 ISSl [M] .
In fact, below the capacity, the solution space does not shatter into exponentially many 
well-separated clusters, as supported by our entropy landscape computation in this paper.
However, the presence of many disjoint components in the solution space, as observed in
the single-instance studies [11, 12], may account for the slowing down of any stochastic
local search heuristics. Both energetic and entropic barriers may coexist in the solution
landscape: the energetic barriers may prevent the random walk from finding a
solution, or the entropic barriers may impose a very long waiting time [11, 12]. This
will become much more evident when the constraint density gets close to the capacity, since
the total number of solutions becomes very small while solution-pairs still maintain a 
relatively broad distribution. 
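
The definition of a connected component given above (solutions linked by paths of successive single-weight flips) can be checked by brute force on very small instances. The Python sketch below is only an illustration under stated assumptions: it enumerates all 2^N weight vectors, so it is limited to small odd N; it assumes the desired outputs have been absorbed into the patterns (so that a solution satisfies J · ξ > 0 for every pattern); and the helper names (`find_solutions`, `connected_components`, `random_patterns`) are hypothetical and not part of the algorithms used in this paper.

```python
import random
from collections import deque


def random_patterns(p, n, seed=0):
    """Draw p random +-1 patterns of length n (labels assumed absorbed into the patterns)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]


def find_solutions(patterns, n):
    """Enumerate all J in {-1,+1}^n, encoded as bitmasks, with J . xi > 0 for every pattern."""
    solutions = set()
    for mask in range(1 << n):
        weights = [1 if (mask >> i) & 1 else -1 for i in range(n)]
        if all(sum(w * x for w, x in zip(weights, xi)) > 0 for xi in patterns):
            solutions.add(mask)
    return solutions


def connected_components(solutions, n):
    """Sizes (largest first) of the clusters of solutions linked by single-weight flips."""
    unvisited = set(solutions)
    sizes = []
    while unvisited:
        start = unvisited.pop()
        queue = deque([start])
        size = 1
        while queue:
            current = queue.popleft()
            for i in range(n):
                neighbor = current ^ (1 << i)  # flip the i-th synaptic weight
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    queue.append(neighbor)
                    size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # One all-ones pattern with N = 3: the four solutions form a single cluster.
    print(connected_components(find_solutions([[1, 1, 1]], 3), 3))  # prints [4]
    # A small random instance (N = 11, P = 6); the result depends on the seed.
    n, p = 11, 6
    sols = find_solutions(random_patterns(p, n, seed=1), n)
    print(len(sols), connected_components(sols, n))
```

Exhaustive enumeration is chosen here only because it makes the definition explicit; it cannot probe the sub-extensive separation k(N) discussed above, which requires much larger systems.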

The distance landscape evaluated here is very similar to the weight enumerator in 
coding theory [35] and the method can be extended to consider the landscape analysis 
for low-density parity-check codes or code-division multiple access multiuser detection 
problems [211 EH] , which will help to clarify what role the distance landscape plays with 
respect to the decoding performance. 